Comparing of SGML documents

Author

  • MohammadTaghi Hajiaghayi
Abstract

Documents can be represented as structures with a hierarchical arrangement of text and non-text nodes, where nodes are labeled by category names such as paragraph and section. Representing documents this way is a natural consequence of using the Standard Generalized Markup Language (SGML) to encode text documents, which has many applications in different areas. There are many circumstances in which one structured document is a variant of another and we need to compare them to find the relationships (editing changes) between them. One simple way to do that is to model SGML documents as ordered labeled trees and then find a mapping between them. In this proposal we consider this problem in depth and mention some solutions from the literature. We also introduce a research area that consists of a better approach to modeling SGML documents, together with related problems. Finally, we mention some methods for attacking these problems.

Keywords: SGML, Comparing of documents, Ordered Labeled Trees, Tree-to-tree correction, Directed Labeled Graphs.

1 Motivation and Significance of the problem

Documents can be represented as structures with a hierarchical arrangement of text and non-text nodes, where nodes are labeled by category names such as paragraph and section. Representing documents this way is a natural consequence of using the Standard Generalized Markup Language (SGML) to encode text documents. SGML and variants such as HTML are widely used, and even documents that are not simple hierarchies can be represented using SGML. One simple way to model documents represented in SGML or HTML is as trees with labeled nodes, where the left-to-right ordering of the offspring of a node is significant. We call such a tree an ordered labeled tree. However, there are other, more sophisticated ways to model SGML documents, which we consider later. There are many circumstances in which one structured document is a variant of another.
For example, there may be several manuscript or printed versions of an existing text; there may be several translations of a text; there may be several machine-readable forms of a text produced for different audiences (e.g., a text with minimal apparatus for students and a more extensive apparatus for researchers); or there may be interest in maintaining several versions of a document that is being produced cooperatively by several authors. In all of these circumstances, we need to compare different variants of a document and find the relationships (editing changes) among them, and in most cases these variants have a hypertext structure such as SGML. Finding the editing changes can also be used in other applications like:

Similar resources

Complementary Approaches to Representing Differences Between Structured Documents

Structured documents Documents can be represented as structures with a hierarchical arrangement of text and non-text nodes, where nodes are labelled by category names such as “paragraph” and “section”. Representing documents this way is a natural consequence of using the Standard Generalized Markup Language (SGML) to encode the content and form of documents [10, 11, 7]. SGML is widely used. HTM...


On the Interchangeability of SGML and ODA

SGML and ODA are international standards for the markup and interchange of electronic documents. These standards are incompatible, in the sense that in general a document encoded using SGML cannot be used directly in an ODA-based system, and vice versa. We first describe these two standards, and suggest criteria under which a bridge between the two standards could be evaluated. We then evaluate...


Extending SGML to Accommodate Database Functions: A Methodological Overview

* Partially supported by US Dept. of Education award number P200A502367 and NSF Research and Infrastructure grant, award number NSF CDA-9303189. Abstract A method for augmenting an SGML document repository with database functionality is presented. SGML [ISO 8879, 1986] has been widely accepted as a standard language for writing text with added structural information that gives the text greater ...


Processing SGML Documents

SGML (Standard Generalized Markup Language) is an ISO Standard that specifies a language for document representation. The main idea behind SGML is to strictly separate the structure and contents of a document from the processing of that document. This results in application-independent and thus reusable documents. To gain the full benefit of this approach, tools are needed to support a wide ran...


Docbase - a Database Environment for Structured Documents

Standard Generalized Markup Language (SGML) has been widely accepted as a standard for document representation. The strength of SGML lies in the fact that it embeds logical structural information in documents while preserving a human-readable form. This structural information in SGML documents allows processing of these documents using database techniques. SGML facilitates this goal by providin...


Publication date: 2001